Rethinking Full-Text Search for Multi-lingual Databases

Authors

  • Jeffrey S. Sorensen
  • Salim Roukos
Abstract

Textual fields are commonly used in databases and applications to capture details that are difficult to formalize—comments, notes, and product descriptions. With the rise of the web, users expect that databases be capable of searching these fields quickly and accurately in their native language. Fortunately, most modern database systems provide some form of full-text indexing of free text fields. However, these capabilities have yet to be combined with the simultaneous demand that databases provide support for world languages. In this paper we introduce several of the challenges for handling multilingual data and introduce a solution based on an architecture that enables flexible processing of texts based upon the properties of each text’s source language. Extending the indexing architecture, and standardizing the query capabilities, are important steps to creating the applications that will serve world markets.
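The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea of choosing text processing by source language; it is not the authors' design. All names here (`ANALYZERS`, `MultilingualIndex`, the two analyzer functions) are hypothetical, and the choice of word splitting for English and character bigrams for Japanese is an assumption made purely for illustration.

```python
import unicodedata
from collections import defaultdict


def analyze_words(text):
    """Case-fold and split on non-alphanumeric characters (suits space-delimited scripts)."""
    tokens, word = [], []
    for ch in unicodedata.normalize("NFKC", text).casefold():
        if ch.isalnum():
            word.append(ch)
        elif word:
            tokens.append("".join(word))
            word = []
    if word:
        tokens.append("".join(word))
    return tokens


def analyze_bigrams(text):
    """Emit overlapping character bigrams (a common fallback for scripts without spaces)."""
    chars = [ch for ch in unicodedata.normalize("NFKC", text) if ch.isalnum()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


# Hypothetical registry: language tag -> analyzer. Unknown tags fall back to word splitting.
ANALYZERS = {"en": analyze_words, "ja": analyze_bigrams}


class MultilingualIndex:
    """Toy inverted index whose postings are keyed by (language, term)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text, lang):
        analyzer = ANALYZERS.get(lang, analyze_words)
        for term in analyzer(text):
            self.postings[(lang, term)].add(doc_id)

    def search(self, query, lang):
        """Return ids of documents in `lang` that contain every query term."""
        analyzer = ANALYZERS.get(lang, analyze_words)
        terms = analyzer(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            docs = self.postings.get((lang, term), set())
            result = set(docs) if result is None else result & docs
        return result


index = MultilingualIndex()
index.add(1, "Red running shoes", "en")
index.add(2, "東京都の地図", "ja")
print(index.search("running shoes", "en"))
print(index.search("地図", "ja"))
```

Keying postings by language keeps each analyzer's terms separate, so a query is processed with the same rules that were applied to the documents it is matched against. A real system would also need language identification, stemming or morphological analysis, and ranking, none of which this sketch attempts.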

Similar resources

Investigation on Full-Text Databases Cited in LIS

Background and Aim: The main objective of this research was to investigate the use of full-text databases in the LIS theses of Tehran State Universities within the years 2005 and 2009. Method: For this purpose, the total of 9952 citations related to 172 existing theses in the academic central libraries were studied. The data collected were analyzed by the bibliometrics and citation analysis met...

A Pattern Matching Approach for Redundancy Detection in Bi-lingual and Mono-lingual Corpora

The Bi-Lingual and Mono-Lingual Corpora Information relating to numerous Languages may be duplicated. This leads to slow and inaccurate search results from Bi-Lingual and Mono-Lingual databases. It is essential to structure the Sequences in a fashion that reduces the redundant sequence structure so that the analysis of Bi-Lingual and Mono-Lingual Corpora structure is accurate to help in analy...

SPOT: TRW'S Multi-Lingual, Text Search Tool

TRW has developed a text search tool that allows users to enter a query in foreign languages and retrieve documents that match the query. A single query can contain words and phrases in a mix of different languages, with the foreign-language terms entered using the native script. The browser also displays the original document in its native script. Key terms in the browser display are hig...

A Semi-Automatically Enriched Multi-Lingual Terminology in Commercial Products

One way to exploit the CLEF-ER challenge results is to semi-automatically enrich the multi-lingual terminology provided to the CLEF-ER participants. In the current version, English is the predominant language (1.8 m synonyms in 531k concepts). Synonyms in other languages are clearly underrepresented (Spanish: 643k, French: 127k, German: 119k and Dutch: 116k). Two leading text mining companies, ...

A Method for Evaluating Full-text Search Queries in Native XML Databases

In this paper we consider the problem of efficiently producing results for full-text keyword search queries over XML documents. We describe full-text search query semantics and propose a method for efficient evaluation of keyword search queries with these semantics suitable for native XML databases. The method uses an inverted file index, which may be efficiently updated when a part of some XML documen...

Journal:
  • IEEE Data Eng. Bull.

Volume 30

Publication date: 2007